Predicting Students’ Chance of Admission Using Beta Regression

Marcus Chery and Keiron Green

2023-11-19

Introduction

Beta regression is a type of statistical analysis used for modeling dependent variables that are bounded on both sides, typically between 0 and 1. It is particularly useful for variables that represent proportions or percentages.

  • Definition: A statistical technique for modeling data that follows a beta distribution.
  • Key Characteristics:
    • Deals with continuous variables that lie strictly between 0 and 1 (exact 0s and 1s need special handling).
    • Suitable for rates or proportions.
  • Uses:
    • Ideal for modeling rates and proportions in finance, biology, and social sciences.
  • Importance:
    • Unlike ordinary linear regression, keeps fitted values inside the (0, 1) range and accommodates the skewness and non-constant variance typical of bounded data.
  • Mechanics:
    • Models the effect of predictors on the mean of a beta-distributed response variable.
    • Incorporates a link function, similar to logistic regression.
    • Estimation typically done through maximum likelihood methods.
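The mechanics listed above can be made concrete with a small sketch. It assumes the common mean-precision parameterization (mean mu, precision phi, so the beta shape parameters are mu*phi and (1 - mu)*phi) and a logit link; the function names are hypothetical and this is not the authors' code. Maximum likelihood estimation would search for the coefficients and phi that maximize `log_likelihood`.

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf(y, mu, phi):
    """Log-density of a beta distribution with mean mu and precision phi."""
    a = mu * phi            # first shape parameter
    b = (1.0 - mu) * phi    # second shape parameter
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def log_likelihood(coefs, phi, X, y):
    """Sum of beta log-densities; each row of X includes a leading 1 for the intercept."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * x for c, x in zip(coefs, row))  # linear predictor
        total += beta_logpdf(yi, inv_logit(eta), phi)
    return total
```

With mu = 0.5 and phi = 2 the distribution is uniform on (0, 1), so the log-density is 0 for any y, which is a quick sanity check on the parameterization.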

Image credit: Pabloparsil, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=89335966

Methods

Data

Data: University Admission Data

Attribute Information

Variable                                 Parameter          Range                       Description
GRE Scores                               gre_score          290 - 340 (340 scale)       Quantifies a candidate's performance on the Graduate Record Examination, with a maximum score of 340.
TOEFL Scores                             toefl_score        92 - 120 (120 scale)        Measures English language proficiency, scored out of a total of 120 points.
University Rating                        university_rating  1 to 5 (5 is highest)       Rates universities on a scale from 1 to 5, indicating their overall quality and reputation.
Statement of Purpose (SOP) Strength      sop                1 to 5 (5 is highest)       Evaluates the strength and quality of a candidate's SOP on a scale of 1 to 5.
Letter of Recommendation (LOR) Strength  lor                1 to 5 (5 is highest)       Evaluates the strength and quality of a candidate's LOR on a scale of 1 to 5.
Undergraduate GPA                        cgpa               6.8 - 9.92 (10.0 scale)     Reflects a student's academic performance in their undergraduate studies, scored on a 10-point scale.
Research Experience                      research           0 or 1                      Indicates whether a candidate has research experience (1) or not (0).
Chance of Admit                          chance_of_admit    0.34 - 0.97 (0 to 1 scale)  Represents the likelihood of a student being admitted, expressed as a decimal between 0 and 1.

Libraries

Load necessary packages for analysis and modeling.

Loading Dataset

# A tibble: 5 × 9
  serial_no gre_score toefl_score university_rating   sop   lor  cgpa research
      <dbl>     <dbl>       <dbl>             <dbl> <dbl> <dbl> <dbl>    <dbl>
1         1       337         118                 4   4.5   4.5  9.65        1
2         2       324         107                 4   4     4.5  8.87        1
3         3       316         104                 3   3     3.5  8           1
4         4       322         110                 3   3.5   2.5  8.67        1
5         5       314         103                 2   2     3    8.21        0
# ℹ 1 more variable: chance_of_admit <dbl>

Chance of Admit and TOEFL

Exploratory data analysis suggests that higher TOEFL scores are associated with a greater chance of admission. The chance of admission also rises with university rating, and research experience is associated with both higher TOEFL scores and a higher likelihood of admission.

The correlation coefficient of 0.791594 between TOEFL scores and the chance of admit suggests a strong positive relationship. This indicates that as TOEFL scores increase, the chance of admission tends to increase as well.

Summarizing the chance of admission by university rating shows that applicants to higher-rated universities generally have a higher chance of admission. The standard deviation decreases as the university rating increases, suggesting more consistency in admission chances at higher-rated universities.

Applicants with research experience (Research = 1) have a higher average TOEFL score (approx. 110) compared to those without research experience (Research = 0), who have an average TOEFL score of about 104.

The mean chance of admission for applicants with research experience is noticeably higher (approx. 0.796) than for those without (approx. 0.638).

This data suggests that research experience is positively associated with both higher TOEFL scores and a greater likelihood of admission.
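The group comparison described here amounts to averaging a column within each level of `research`. The sketch below illustrates the idea using only the five preview rows printed under "Loading Dataset", so its numbers will not match the full-data averages of about 110 and 104 quoted above; the helper name `mean_by_group` is hypothetical.

```python
from statistics import mean

# (toefl_score, research) pairs copied from the five preview rows of the dataset
preview_rows = [(118, 1), (107, 1), (104, 1), (110, 1), (103, 0)]

def mean_by_group(pairs):
    """Average the first element of each pair within groups defined by the second."""
    groups = {}
    for value, key in pairs:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}

toefl_by_research = mean_by_group(preview_rows)
print(toefl_by_research)
```

With one row in the no-research group, the preview is far too small to support any conclusion; the point is only the shape of the computation.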

Chance of Admit and GRE

These analyses suggest that higher GRE scores are strongly correlated with an increased chance of admission. The likelihood of admission also appears to be influenced by the university rating and is further enhanced by research experience.

A correlation coefficient of 0.8026105 indicates a strong positive relationship between GRE scores and the chance of admission. This suggests that higher GRE scores are generally associated with a higher likelihood of being admitted.

Applicants to higher-rated universities again show a higher chance of admission, with the most favorable chances at the highest-rated universities.

Applicants with research experience (Research = 1) have a higher average GRE score (about 323) than those without research experience (Research = 0), whose average GRE score is approximately 309. Similarly, the mean chance of admission is noticeably higher for applicants with research experience (approx. 0.796) than for those without it (approx. 0.638). Research experience is therefore positively associated with both higher GRE scores and a greater likelihood of admission.

Chance of Admit and CGPA

Applicants with a higher CGPA have a higher chance of admission, and this pattern holds as the university rating goes from low (1) to high (5). The correlation of 0.87 indicates a strong positive relationship between CGPA and the chance of admission, so a higher CGPA is generally associated with a higher likelihood of being admitted.

Chance of Admission Correlation Heatmap

The heatmap shows that GRE scores, TOEFL scores, and CGPA are strongly and positively correlated with the chance of admission. Research experience is also positively correlated with admission chances, albeit to a lesser extent than the academic scores.
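A correlation heatmap is a color-coded view of a pairwise Pearson correlation matrix. As a minimal sketch of how such a matrix is built, the code below uses three columns from the five preview rows shown under "Loading Dataset". The `chance_of_admit` column is not visible in that preview, so it is left out, and the resulting values describe only those five rows, not the full dataset.

```python
import math

# Columns copied from the five preview rows of the dataset
preview = {
    "gre_score":   [337, 324, 316, 322, 314],
    "toefl_score": [118, 107, 104, 110, 103],
    "cgpa":        [9.65, 8.87, 8.0, 8.67, 8.21],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

names = list(preview)
# Nested dict acting as the correlation matrix: corr[row][column]
corr = {a: {b: pearson(preview[a], preview[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 3) for b in names])
```

The matrix is symmetric with ones on the diagonal, which is why heatmaps often display only one triangle.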

Analysis and Results

Fitting Full Model

[1] "Coefficients (Mean Model):"
                  Estimate Std. Error  z value Pr(>|z|)
(Intercept)        -9.7358     0.7785 -12.5066   0.0000
gre_score           0.0084     0.0035   2.4222   0.0154
toefl_score         0.0190     0.0065   2.9333   0.0034
university_rating   0.0481     0.0301   1.5991   0.1098
sop                -0.0577     0.0333  -1.7329   0.0831
lor                 0.1233     0.0358   3.4438   0.0006
cgpa                0.6561     0.0699   9.3840   0.0000
research            0.1499     0.0467   3.2080   0.0013

Log-Likelihood: 408.75 
Pseudo R-squared: 0.8275
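To read the mean-model coefficients, it can help to turn them into a fitted value. The sketch below assumes a logit link (the output does not state the link explicitly; the later model names `gy_logit` hint at it) and plugs the first preview row of the dataset into the rounded estimates printed above. Because the coefficients are rounded to four decimals, the result is approximate.

```python
import math

# Rounded estimates copied from the "Coefficients (Mean Model)" output
coefs = {
    "(Intercept)": -9.7358,
    "gre_score": 0.0084,
    "toefl_score": 0.0190,
    "university_rating": 0.0481,
    "sop": -0.0577,
    "lor": 0.1233,
    "cgpa": 0.6561,
    "research": 0.1499,
}

# First preview row from the "Loading Dataset" output
applicant = {
    "gre_score": 337, "toefl_score": 118, "university_rating": 4,
    "sop": 4.5, "lor": 4.5, "cgpa": 9.65, "research": 1,
}

def predict_mean(coefs, row):
    """Fitted mean under an ASSUMED logit link: inverse logit of the linear predictor."""
    eta = coefs["(Intercept)"] + sum(coefs[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-eta))

mu_hat = predict_mean(coefs, applicant)
# Under a logit link, exp(coefficient) is the multiplicative change in the
# odds mu / (1 - mu) for a one-unit increase in the predictor.
cgpa_odds_ratio = math.exp(coefs["cgpa"])
print(round(mu_hat, 3), round(cgpa_odds_ratio, 3))
```

Under these assumptions the fitted chance of admission for this applicant comes out at roughly 0.91, and a one-point increase in CGPA multiplies the odds of the mean by a little under 2, holding the other predictors fixed.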

Choosing Best Fit Model

  Size
1    1
2    2
3    3
4    4
5    5
6    6
7    7
                                                                         Variables
1                                                                (Intercept), cgpa
2                                              (Intercept), gre_score, toefl_score
3                                                (Intercept), gre_score, lor, cgpa
4                                      (Intercept), gre_score, lor, cgpa, research
5                 (Intercept), gre_score, toefl_score, university_rating, sop, lor
6           (Intercept), gre_score, toefl_score, university_rating, sop, lor, cgpa
7 (Intercept), gre_score, toefl_score, university_rating, sop, lor, cgpa, research
  R.Squared Adj.R.Squared       BIC
1 0.7626339     0.7620375 -563.2776
2 0.6925050     0.6909559 -453.7442
3 0.7941207     0.7925610 -608.2202
4 0.7986651     0.7966263 -611.1569
5 0.7504324     0.7472653 -519.2615
6 0.7987119     0.7956388 -599.2669
7 0.8034714     0.7999619 -602.8472
  nvmax       RMSE  Rsquared        MAE      RMSESD RsquaredSD       MAESD
1     1 0.06889355 0.7635275 0.05110306 0.009641911 0.06007377 0.007013934
2     2 0.07889825 0.6984116 0.06031014 0.013609817 0.06846210 0.010474096
3     3 0.06452847 0.7965067 0.04693197 0.009982503 0.05017603 0.007105953
4     4 0.06692766 0.7838953 0.04978752 0.015166549 0.06632972 0.012106201
5     5 0.06929365 0.7693051 0.05126744 0.008662209 0.05018380 0.005967808
6     6 0.06422188 0.7988891 0.04638031 0.010252374 0.05087166 0.007242793
7     7 0.06347942 0.8033827 0.04585758 0.010594007 0.05415636 0.007976196
      Model       AIC Pseudo_R_Squared
1  gy_logit -798.5518        0.8262585
2 gy_logit1 -791.6053        0.8198808
3 gy_logit2 -727.0029        0.7706337
4 gy_logit3 -791.7938        0.8229096
5 gy_logit4 -799.3962        0.8250244
6 gy_logit5 -792.9895        0.8202429
7 gy_logit6 -794.4413        0.8228542
8 gy_logit7 -798.5518        0.8262585
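The comparison tables can be scanned programmatically by taking the minimum of each criterion, since lower BIC, AIC, and RMSE are better. The sketch below copies the values from the output above. Two assumptions are labeled in the comments: the table with `nvmax` and `RMSE` columns is treated as resampled prediction error, and the predictors behind each `gy_logit` model name are not shown in this document, so only the names are compared.

```python
# BIC by subset size, copied from the first table
bic_by_size = {1: -563.2776, 2: -453.7442, 3: -608.2202, 4: -611.1569,
               5: -519.2615, 6: -599.2669, 7: -602.8472}

# RMSE by nvmax, copied from the second table (ASSUMED to be resampled error)
rmse_by_size = {1: 0.06889355, 2: 0.07889825, 3: 0.06452847, 4: 0.06692766,
                5: 0.06929365, 6: 0.06422188, 7: 0.06347942}

# AIC by beta regression model name, copied from the third table
# (the predictors in each model are not listed in the output)
aic_by_model = {"gy_logit": -798.5518, "gy_logit1": -791.6053,
                "gy_logit2": -727.0029, "gy_logit3": -791.7938,
                "gy_logit4": -799.3962, "gy_logit5": -792.9895,
                "gy_logit6": -794.4413, "gy_logit7": -798.5518}

best_bic_size = min(bic_by_size, key=bic_by_size.get)
best_rmse_size = min(rmse_by_size, key=rmse_by_size.get)
best_aic_model = min(aic_by_model, key=aic_by_model.get)

print("Lowest BIC at subset size:", best_bic_size)
print("Lowest RMSE at subset size:", best_rmse_size)
print("Lowest AIC model:", best_aic_model)
```

The criteria do not agree: BIC favors the four-variable subset (gre_score, lor, cgpa, research), RMSE favors the full seven-variable model, and AIC favors `gy_logit4` by less than one unit over `gy_logit` and `gy_logit7`, which share identical values. Differences this small suggest the top candidates fit about equally well.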

Conclusion

References